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O Abstract 

We show that in a particular model of catalytic reaction systems, known as the 
binary polymer model, there is a mathematical invariance between two versions 
of the model: (1) random catalysis and (2) template-based catalysis. In partic- 

p I ular, we derive an analytical calculation that allows us to accurately predict the 

Q (observed) required level of catalysis in one version of the model from that in 

. ^ the other version, for a given probability of having self-sustaining autocatalytic 

sets exist in instances of both model versions. This provides a tractable connec- 
tion between two models that have been investigated in theoretical origin-of-life 

'— ' studies. 
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1. Introduction 

I In previous work, we introduced and analyzed a model of catalytic reaction 

systems (CRS) [111 US E], based on the original model of Kauffman [HI E|. 
. !^ This model consists of molecule types (bit strings up to a given length n) , reac- 

tions (ligation and cleavage), and randomly assigned catalysis, where molecule 
?-H types can catalyze reactions with a certain (given) probability. Such models 

^ were developed and investigated within the context of theoretical origin-of-life 

studies. In this setting, a question of particular interest is the level of catalysis 
required to have a high probability of autocatalytic sets ( "closed" , self-sustaining 
subsets of molecules and reactions) appearing in instances of this random binary 
polymer CRS model. We showed, both theoretically and computationally, that 
this level (in terms of the average number of reactions catalyzed by any one 
molecule) needs only grow linearly with n (the maximum molecule size in the 
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system) to get a high probabiHty of autocatalytic sets [S1II2]- However, in [5], we 
showed that there is still some discrepancy between the theoretically predicted 
and computationally observed required levels of catalysis, and we provided a 
partial explanation for this discrepancy. The theoretical analysis thus provides 
an important (linear) upper bound, but not (yet) an exact prediction. 

In [5], we also analyzed an extension of the basic model, based on initial 
experiments reported in [S], where instead of completely randomly assigned 
catalysis, a molecule has to match the reaction template (a certain number of 
bits around the reaction site) to be able to be a catalyst for that particular 
reaction. For this more realistic version of the CRS model, we also showed 
(again both theoretically and computationally) that a linear growth rate (in 
n) in the level of catalysis suffices to get a high probability of autocatalytic 
sets. However, in this case there is also a discrepancy between the theoretically 
predicted and computationally observed values. 

In this paper we take a different approach and ask whether it is possible 
to accurately predict the (observed) required level of catalysis in the template- 
based version of the model given the (observed) values in the original, completely 
random version of the model (and vice versa). At first, this may not seem obvi- 
ous, as the two versions of the model have quite different constraints on which 
molecules can/will catalyze which reactions. However, we derive an analytical 
calculation that allows us to make this prediction to a very high degree of accu- 
racy. Therefore, we can conclude that, as far as the probability of the existence 
of self-sustaining autocatalytic sets is concerned, the more realistic template- 
based version of the model is (mathematically) no different from the simpler 
original (completely random) version of the model. 

The outline of this paper is as follows. In the next section, we briefly review 
the mathematical CRS model, in particular the relevant notation and defini- 
tions, and highlight the differences between the two versions of the model. In 
Section [3] we derive, step by step, an analytical calculation that allows us to 
directly relate the required level of catalysis in both model versions. In Section 
|4] we then show the close match between the analytically predicted and actu- 
ally observed required level of catalysis in the template-based model. Finally, 
Section [5] summarizes the main results and conclusions. 

2. A model of catalytic reaction systems 

Consider a catalytic reaction system (CRS) Q — {X,TZ, C), where X is a set 
of molecule types x, 7?, is a set of reactions r (converting reactants into products) 
and C is a catalysis set, i.e., a set of molecule-reaction pairs {x,r) indicating 
that molecule x can catalyze reaction r. We also include the notion of a food 
set F G X, which is assumed to contain molecule types that are freely available 
in the environment. See [6l I16j for a detailed definition. 

A particular CRS model that was introduced in [H [9] , here referred to as 
the binary polymer model, consists of; 



2 



• A set of molecule types represented by bit strings up to a given length n, 
i.e., X = {0,1}^"; 

• A food set consisting of bit strings up to a given (small) length t < n, i.e., 
F = {0,1}^*; 

• Two types of reactions: (1) ligation, which "glues" two smaller bit strings 
together into one larger bit string, and (2) cleavage, which breaks a larger 
bit string into two smaller ones. The reaction set TZ consists of all such 
reactions that are possible within the constraint of the maximum molecule 
length n. 

• Randomly assigned catalysis, where each possible molecule-reaction pair 
(x, r) is independently assigned to the set C with a given probability p(n) 
(this probabilistic assignment is done once at the model instantiation; each 
such instantiation thus gives rise to a different set C). 

Thus, n, t, and p{n) are the model parameters. 

The main question that was studied in with this model is: under which 
conditions (model parameter values) is there a high probability of an autocat- 
alytic set existing within a full CRS? An autocatalytic set can be (informally) 
described as a subset of reactions TZ' CTI (and the molecules involved in these 
reactions) in which (1) each reaction r G TZ' is catalyzed by at least one molecule 
that is either in the food set or can be produced from it by repeated application 
of reactions only from TZ' , and (2) all reactants of the reactions r G TZ' are 
either in the food set or can be produced from it by reactions from TZ' only (a 
simple example is provided in Fig. [ij^i)). For a formal definition, see [6], where 
we used the term RAF (reflexively autocatalytic and food-generated) for such 
autocatalytic (sub)sets. 

Note that this CRS and RAF formalism is similar, or at least related, to 
alternative models in the context of the origin of life, such as petri nets [T4] . 
(M,R) systems [El [HI [7], the chemoton model [3], other artificial chemistries 
and topological approaches , and several other frameworks (see also [U [TU] for 
a more complete overview and comparison). However, what most of these other 
formalisms seem to be missing, is some way of actually finding or identifying 
autocatalytic sets within a larger catalytic reaction system, and a thorough 
analysis of the probabilities of finding such subsets under different conditions 
(model parameters). 

In [5] , we introduced an efficient algorithm to find RAF sets in any (general) 
CRS, and applied it to instances of the binary polymer model. We showed both 
computationally [H] and theoretically [inj that a linear growth rate (with n, the 
maximum molecule size) in the level of catalysis (in terms of the average number 
of reactions catalyzed per molecule) is sufficient to achieve a high probability 
Pn of autocatalytic sets occurring in instances of the random binary polymer 
CRS model. 

In [S], we then analyzed a chemically more realistic version of the binary 
polymer model: template-based catalysis. This differs from the original model 
in the way molecule-reaction pairs (x, r) are included in the catalysis set C: 



3 



• Each possible molecule-reaction pair (x, r) is considered for inclusion in the 
set C with a given probability p'{n), but is only allowed to be included if 
the molecule x, somewhere along its length, matches the reaction template 
of reaction r. 

The reaction template, following [S], is made up of the two bits on either side 
of the reaction site, or four bits in total (in previous studies, the complement 
of the reaction template was actually used; however, given that the model uses 
bit strings, this does not make a difference in terms of the mathematics due 
to the inherent symmetry). So, for example, in the ligation reaction 000 + 
mil 00011111, the reaction template is 0011, i.e., the last two bits of the 
first reactant plus the first two bits of the second reactant (another example is 
provided in Fig. [l]jii)). For this template-based catalysis version of the model, 
we also showed that a linear growth rate (with n) in the level of catalysis suffices 
for autocatalytic sets to appear with high probability (although with a higher 
coefficient, or slope, in the linear relation than for the original model). 




OHIO 



0111010 



a b 



(l) (11) 

Figure 1: (i) An example of a simple CRS in which a, b and c constitute the food set, catalysis 
is indicated by dashed lines, and where the reactions TZ' = {ri,r2} (shown in bold) form an 
RAF set (each reaction in TZ' is catalyzed by at least one molecule that is either in the food set 
or can be produced from it by repeated application of reactions only from Ti' , and all reactants 
of the reactions in TZ' are either in the food set or can be produced from it by reactions from 
TZ' only). This is the only RAF present in this CRS. (ii) In the original polymer model (M), 
either one of the potential catalysts shown (110011 and OHIO) has the same probability p 
of catalyzing the ligation reaction 10110 -I- 0111010 101100111010; in the template-based 
model (TM), only 110011 contains a matching template (namely *1001*) for this reaction. 
In order for the two models to have equal probabilities of containing an RAF, the probability 
p' that a template-matching molecule catalyzes a reaction needs to be elevated in the TM 
model (indicated by the bold dashed line) by a factor m (relative to p in model M) that can 
be calculated analytically. 

So, in the original (template-free) random catalysis version of the model, 
we have Pr[(x,r) G C] = p{n). Let m{n) be the probability that an arbitrary 
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molecule (of length at most n) matches the four-bit reaction template of an 
arbitrary reaction. Then Pr[(x,r) S C] = p'{n) ■ m(n) in the template-based 
version of the model. This suggest that taking p'{n) = p{n)/m{n) might lead 
to a similar probability P„ of finding autocatalytic sets in the template-based 
model as with p{n) in the original model. We will show that this heuristic 
estimate turns out to be extremely accurate, which is not clear a priori, as 
the two models are quite different (in the original model molecules catalyze 
each reactions with identical probabilities, while in the template-based model, 
these probabilities are highly heterogeneous). So, to predict the required level 
of catalysis p' {n) in the template-based version of the model, given p{n) and a 
particular value of P„, we need to know the probability m{n). 

3. The molecule-reaction template match probability 

In this section, we derive a precise mathematical procedure for calculating 
the molecule-reaction template match probability m{n) analytically. This pro- 
cedure consists of several steps, which will be explained in detail. We then show 
the results of applying the derived procedure to calculate the actual template 
match probabilities. 

3.1. The number oj substring matches 

As the first step, we obtain a procedure to calculate analytically the number 
/s (n) of bit strings of a given length n that contain a given substring s of given 
length. We will describe this in detail for a substring s of length four, but the 
procedure is applicable for any substring length. In the context of our CRS 
model, fs (ri) is equivalent to the number of molecules of a given length n that 
match a given four-bit reaction template s. 

We apply a mathematical technique, known as the transfer-matrix method 
[T5| . which calculates the number fs{n) of bit strings of length n that do not 
contain a given substring s. This is actually a very general method with a wide 
range of applicability, but here it will be explained explicitly in the context of 
counting bit strings of length n that contain a given substring of length four. 

To apply this method, we first need to construct a state transition graph 
Gs that can generate all bit strings (of any length) that do not contain a given 
substring s of length four. For this purpose, it is instructive to start with a state 
transition graph G that generates all possible bit strings. This graph G is shown 
in Fig. |2j and can be interpreted as follows. Each node represents one of eight 
possible states, with a state being defined (and labeled) by the last three bits 
that were generated at any given time step. For example, the bottom-left node 
labeled 000 represents a state where the last three generated bits were all zero. 
The bit that will be generated at the next time step can either be another zero 
(a state transition represented by the arrow that loops back to the same node), 
or a one (a state transition represented by the arrow that points to the node 
labeled 001). Following allowed state transitions (arrows) this way, all possible 
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Figure 2: The state transition graph G that generates all possible bit strings. 



bit strings can be generated. This state transition graph G can be represented 
mathematically by its adjacency matrix A: 
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where there is a 1 in position if there is an arrow in the corresponding 

graph G from state i to state j; otherwise, there is a 0. 

From this general graph G (and matrix ^4), it is now straightforward to con- 
struct the corresponding graph Gs (and matrix As) that generates all bit strings 
that do not contain a given substring s of length four. For example, consider 
s = 0010. In this case, in the graph G of Fig. [2] making the transition from 
node 001 to node 010 would not be allowed. Equivalently, the entry (001,010) 
in the adjacency matrix A should be set to zero. The resulting graph Gqoio 
and adjacency matrix ^ooio are shown in Fig. [sjand Eqn. ([2]), respectively (the 
changed entry in the adjacency matrix is shown in bold). 
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Figure 3: The state transition graph GoolO that generates all bit strings that do not contain 
the substring 0010. 



Similarly, for each of the other 15 possible teraplates s of length four, there is a 
state transition graph Gs where one of the 16 arrows of the original graph G is 
left out, and a corresponding adjacency matrix where one of the 16 ones in 
matrix A is set to zero. 

Now, according to the transfer-matrix method [H] , the number fs (n) of bit 
strings of length n that do not contain a given substring s of length four can be 
obtained by first computing the following generating function: 

F,{X)=Y^ {I^XAs)-; (3) 

and then takiiig the coefficient of A"~^ in Fs{X) as the value for fs{n)- Note 
that, in Eqn. / refers to the 8x8 identity matrix, and (/ — XAs)^j^ is the 
row i and column j entry of the inverse of the matrix / — XAg . 

Given that there are 2" bit strings of length n, the number fs{n) of bit 
strings of length n that contain a given substring s of length four is now simply: 

fs{n) = 2^-Ts{n). (4) 

3.2. The template match probability 

As the next step, using the results of Eqn. Q, the average (or expected) 
number g{n) of bit strings of length n that contain an arbitrary substring of 
length four, is now easily calculated as 

From this, the probability h{n) that a bit string of length at most n contains 
an arbitrary substring of length four, is then calculated as 

Hn) = (6) 
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where 2"+-'^ — 2 is the total number of bit strings of length at most n. In practice, 
the summation needs only run from i = 4 to n, as any bit string of length n < 4 
will obviously not contain a substring of length four (i.e., g{n) = for n < 4). 

In the context of our CRS model, this probability h{n) can be taken as 
the probability that an arbitrary molecule up to length n matches an arbitrary 
four-bit reaction template. 

3.3. The fraction of reactions with a four-bit reaction template 

So far, we have assumed that there actually is a substring of length four to 
be contained (or not) in an arbitrary bit string (or, in our context, a four-bit 
reaction template to be matched by a molecule). However, in the template- 
based catalysis version of our model, this is not always the case. In particular, 
all reactions that have at least one molecule of length one as a reactant (or 
product, in case of a ligation), do not have a valid four-bit reaction template, 
given that wc require two bits on each side of the reaction site to make up a four- 
bit template. However, these reactions can still be considered (with probability 
p'{n)), along with a molecule x, for inclusion in the catalysis set C (which, of 
course, will never actually be "approved" , given that they do not have a valid 
four-bit template). 

So, as the final step, we need to calculate the fraction k{n) of reactions 
that actually do have a four-bit reaction template, in order to "discount" for 
those reactions that might be considered, but which do not have a valid reaction 
template. A straightforward counting argument will provide us with a simple 
expression for this fraction k{n). 

First, recall that the total number of reactions, for a given n, is (n — 
2)2"+^ + 4. Next, we need to count the number of reactions that have at 
least one molecule of length one as a reactant (since reactions are considered 
bi-directional, we do not need to consider cleavage reactions separately). A 
molecule of length one can react (ligate) with any other molecule of length at 
most n — 1 (otherwise, the maximum molecule length n would be violated). 
There are two molecules of length one, which can be either the first or the sec- 
ond reactant (i.e., four combinations) and there are 2" — 2 molecules of length 
at most n — 1, so we have 4(2" — 2) — 4 reactions with at least one molecule 
of length one as a reactant (the '—4' at the end is because otherwise we would 
double-count the four possible reactions where both reactants are of length one). 
So, the fraction k{n) of reactions without a valid four-bit reaction template is: 

- _ 4(2" - 2) - 4 _ 2"+2 - 12 _ 2"+2 _ 2 

~ (n-2)2"+i +4 ^ (n-2)2«+i +4 ~ (n - 2)2«+i ~ ^T^' ^ ^ 

Consequently, the fraction k(n) of reactions with a valid four-bit reaction tem- 
plate is 

Mn) = l-Mn) = l-^ = ^. (8) 
Note that this is an approximation, and the exact number is actually equal to: 

fc(„) = ''~^+^ . (9) 
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However, even for relatively small values of n (10 or higher suffices for our 
purposes), the difference between the exact and simpler approximate values 
becomes negligible. 

Now we can finally put everything together and calculate the probability 
m{n) that an arbitrarily chosen molecule of length at most n will match the four- 
bit reaction template of an arbitrarily chosen reaction (including the possibly 
that a reaction does not have a valid reaction template, in which case there is 
no match), by combining Eqns. ^ and ([8|: 

m{n) ^ k{n) ■ h{n) = ^^^—^h{n). (10) 

3.4- Results 

We used the software package wxMaxima pj, which is capable of performing 
symbolic computation and is available for free under the GPL license, to com- 
pute the generating functions Fs{X) (Eqn. ([s])) and the corresponding values of 
/s(n) (Eqn. Q). For example, for s = 0010 (using the adjacency matrix Aqoio 
in Eqn. we get: 

_ 4A3-2A^-A + 8 

-A4 + A3-2A + 1- ^^^^ 

After applying a Taylor expansion, this yields: 

^ooio(A) 8 + 15A + 28A2 + 52X^ + giX'^ + 181X^ + ... (12) 



So, for example, /ooio(5) = 28, i.e., the coefficient of A'"'^'^ = A^ in _Fooio('^)- 
From this, it can be easily calculated that /ooio(5) = 2^ — /ooio(5) = 32 — 28 = 4, 
i.e., four bit strings of length n = 5 contain the substring s = 0010. The results 
for all templates and 4 < n < 20 are given in Table [T] Note that because of the 
inherent symmetry, we only show values for the eight templates that start with 
a 0. To find the value for a template s that starts with a 1, simply look up the 
value for its (binary) complement s. For example, /ioio(12) = /oioi(12) = 1731. 
Notice also that other symmetries exists ~ in particular, /„(s) is identical to 
/„(s') when s' is template s in reverse order. This and further symmetries 
ensure that the number of distinct columns in Table [T] is only four, rather than 
the eight possible. 

Using the values from Table [l] we can compute g{n) (using Eqn. ([5])), 
then h{n) (using Eqn. ([6])) and k{n) (using Eqn. ([S])), and, finally, m{n) 
(using Eqn. (10)). The results of these calculations are shown in Table [2] for 



4 < n < 20. The (analytically) calculated values for /s(n) and g{n) were verified 
by a small computer program we wrote to explicitly enumerate all possible bit 
string/template combinations and counting the number of matches. However, 
for larger values of n, this would obviously be intractable, whereas the analytical 
(transfer-matrix) method will still be feasible. 
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Table 1: Values for fs{n) for all eight possible four-bit templates that start with a 0, and for 
4 < n < 20, as calculated using the transfer-matrix method. By symmetry, we only need to 
describe templates that start with 0. Other symmetries are also apparent in the table: notice 
that the three columns headed by a bold template are identical, and the three columns headed 
by an underlined template are also identical. 

4. Predicting the required level of catalysis 

Table [2] provides us with the molecule-reaction template match probabilities 
m(n) which can be used to predict the required level of p'{n) (the probability of 
considering a molecule-reaction pair for inclusion in the catalysis set C), given 
p{n) and a probability P„ of observing autocatalytic sets. 

To test these predictions, we performed computer simulations with the two 
versions of the binary polymer model, using our RAF algorithm to detect au- 
tocatalytic sets in instances of these CRS models. For various values of n, we 
identified the required levels of catalysis pin) (random catalysis model) and 
p'{n) (template-based model) for which there was a probability Pn = 0.5 of 
finding RAF setfQ In the template-based catalysis version of the model, we 
used reaction templates of four bits (two bits on each side of the reaction site) 
that have to be matched completely by a candidate catalyst (no partial matches 
allowed). We used a value of t = 4 for the food set (i.e., molecules up to length 
four) to allow at least some food molecules to also be catalysts. For each com- 
bination of n and p{n) (or p'{n)), we generated between 500 (for larger n) and 
10,000 (for smaller n) instances of the binary polymer model, and applied our 
RAF algorithm to count the fraction (P„) of these instances that contain an 



^In previous papers, we have already shown that Pn = 0.5 provides a useful reference point. 



10 









h{n) 




kin)* 




7n 1 n 1 
1 1 t\i I J 


A 


1, 


• UUU 


n 


.uooooo 




u. 




n 

u 


.uu / o^o 


O 


o 
o. 


.O ID 


n 


n7Sfi9Q 
.u t ouzy 


u. 




n 

u 




o 


1 1 
i i , 


.0 10 


n 


1 9SQfiS 




u. 


^971 "^9 


n 

u 


0(^70^*^ 
.uu / yoo 


7 




R9^ 


n 


1 SOfil n 
.louuiu 


n 

u. 


,uiuoyz 


n 

u 


1 1 0970 


Q 
O 


79 


,ZOU 


n 


9'?1 fil S 

.^O lUi-O 




u. 


fi7i nni 

,U ( lUUl 


n 

u 


. lOO^lU 


Q 


168, 


,625 


n 


980^77 




u. 


71 4fiQ^ 


n 

u 


900^97 


10 


382, 


,375 


0, 


.327041 


0, 


,750000 


0, 


.245281 


11 


849, 


,000 


0, 


.370817 


0, 


,777778 


0, 


.288413 


12 


1855, 


,125 


0, 


.411874 


0, 


,800000 


0, 


.329499 


13 


4002, 


,875 


0, 


.450258 


0, 


,818182 


0, 


.368393 


14 


8551, 


,000 


0, 


.486087 


0, 


,833333 


0, 


.405072 


15 


18118, 


,000 


0, 


.519503 


0, 


,846153 


0, 


.439579 


16 


38129, 


,250 


0, 


.550655 


0, 


,857142 


0, 


.471990 


17 


79787, 


,000 


0, 


.579691 


0, 


,866667 


0, 


.502399 


18 


166151, 


,625 


0, 


.606755 


0, 


,875000 


0, 


.530911 


19 


344567, 


,000 


0, 


.631982 


0, 


,882353 


0, 


.557631 


20 


712003, 


,750 


0, 


.655501 


0, 


,888889 


0, 


.582668 



Table 2: The analytically calculated values for g{n), h{n), k(n), and m(n), for 4 < n < 20. 
*For the calculation of k{n), we used the exact formula for n < 10 and the approximation for 
n > 10. For n = 10, the two values differ by only 0.1%. 

RAF set. Due to the exponential growth in the size of the reaction set TZ with 
increasing n, we went up to rt = 18 only, which already took many weeks of 
computing time, even on a large parallel computer cluster. 

With the simulation results and the analytically calculated values of m{n) 
from Table [2) we can finally compare the predicted values p{n)/m{n) with the 
observed values p'{n), which is shown in Fig. |4j As this figure shows, there is 
a very close match between the predicted and observed values, with increasing 
accuracy for increasing values of n. Figure [5] shows the same results on a log 
scale, to show the results (and increasingly close fit) more clearly for larger 
values of n. This confirms our proposition that the required levels of catalysis 
in both model versions can be accurately predicted from each other using the 
molecule-reaction template match probabilities m(n). 

Next, we performed the same comparison for a fixed value of n but varied 
the probability P„ of finding RAF sets. For two given maximum molecule 
lengths (n = 12 and n = 15), we identified the values of p(n) and p'{n,) to get 
probabilities of finding RAF sets of P„ G {0.5, 0.6, 0.7, 0.8, 0.9}. Figure [6] shows 
the predicted values p{n)/m{n) and the observed values p'{n) for the template- 
based version of the model. Again, there is a close agreement (for n — 12, the 
difference is about 2%; for n = 15, merely 0.3%). So, the relationship also seems 
to hold for different values of P„. 

With these results, we can conclude to have established a direct relationship 
between two structurally distinct versions of the binary polymer model. More 
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Figure 4: Comparison of the predicted {p{n)/m{n); solid line) and observed (p'(n); dotted 
line) values of the required level of catalysis in the template-based version of the binary 
polymer CRS model, for various values of n. 

precisely, we have shown how these two models lead to identical predictions 
concerning the probability P„ of finding autocatalytic sets, when the probabili- 
ties of catalysis are related by the m{n) scaling factor, which can be calculated 
analytically. 

5. Conclusions and discussion 

We have compared two different versions of a well-known binary polymer 
model of catalytic reaction systems as introduced in |51 H] and investigated in 
detail in our own previous work [S] IHl 1121 116j . In the first version of the model, 
the probability p that an arbitrary molecule will catalyze an arbitrary reaction 
is equal and independent for each possible molecule-reaction pair. In the second 
version, a molecule will catalyze a reaction with similar probability p' only if it 
matches the four-bit template around the reaction site. 

At first glance, these two versions seem to impose rather different constraints, 
resulting in rather different (actual) molecule-reaction catalysis sets C. How- 
ever, we have shown that it is possible to calculate analytically a factor m which 
directly (and accurately) relates the catalysis probabilities p and p' in terms of 
the probabilities P„ of finding autocatalytic sets in instances of both model ver- 
sions. In other words, given the required level of catalysis in one model version 
to find autocatalytic sets with a given probability P„ , it is possible to accurately 
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Figure 5: Comparison of the predicted {p(n)/m(n); solid line) and observed (p'(n); dotted 
line) values of the required level of catalysis in the template-based version of the binary 
polymer CRS model, for various n, using a log scale. 

predict the required level of catalysis in the other model version to get the same 
probability P„ of finding autocatalytic sets. 

Because of this mathematical invariance between the two model versions, we 
can conclude that results obtained with the original (purely random and less 
realistic) model can still be considered relevant in the more realistic context of 
template-based catalysis. We have shown that the mathematical relationship 
holds for various n (with fixed P„) as well as for various P„ (with fixed n), 
with increasing accuracy for increasing n, so it appears to be a very robust 
relationship. Furthermore, the analytical method for calculating the scaling 
factors rn(n) can be easily generalized to, for example, non-binary molecules or 
different template lengths, and therefore does not depend on specific values of 
the model parameters. 

This result clearly (and substantially) counters one potential and important 
objection that the binary polymer model in its original form (with purely ran- 
domly assigned catalysis) is too simplistic. We already showed earlier that it is 
straightforward to add more realistic extensions to the original model, such as 
template-based catalysis, which still results in similar behavior (in particular, a 
linear growth rate in the level of catalysis is sufficient for a high probability of 
finding RAF sets) [SI. In the current paper, we established a different, more di- 
rect and analytical relationship between the required levels of catalysis in these 
two model versions. Furthermore, it is interesting to note that recently, an in- 
vitro "implementation" of the binary polymer model was realized [17] , where the 
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Figure 6: Comparison of the predicted {p(n)/m(n); solid line) and observed (p'(n); dotted 
line) values of the required level of catalysis in the template-based version of the binary 
polymer CRS model, for various values of P„. 



food set consists of dinucleotides such as CA and TG, which can be interpreted 
as Os and Is in the binary model J18^. The authors conchide that "different 
building blocks can be incorporated into the strand, promoting the formation of 
combinatorial libraries of oligonucleotides long enough to be folded into specific 
catalytically active structures and to potentially form initial autocatalytic sets" 
[17j . This is a further indication that the binary polymer model (and its re- 
sults) has direct (bio)chemical relevance, including in the context of origin of 
life studies. 
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